-
-
Notifications
You must be signed in to change notification settings - Fork 651
[client] Feature/lazy connection #3379
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Fix thread handling
Co-authored-by: hakansa <[email protected]>
fix idle state managmenet Validate the threshold env variable If the remote peer initiates the connection, we just reset the inactivity listener, but do not start it. As a result, after a disconnection, our peer never switches back to the idle state. For every peer, we instantiate an inactivity listener. It is a reusable timer, but it must start on a go routine after it has been stopped. The key challenge is to ensure that we never start the timer multiple times and always clean it up properly when the peer is in the idle state or it has been set to the exclusion list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PR description doesn't match code anymore (env vars)
if d.isClosed { | ||
d.peerCfg.Log.Debugf("exit from activity listener") | ||
} else { | ||
d.peerCfg.Log.Errorf("failed to read from activity listener: %s", err) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we not retry (continue) reading if there's an error != closed? Maybe with a ctx check in-between
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With it, we risk a high load loop. If it fails to read from UDP once, what makes us think it will succeed the second time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With it, we risk a high load loop.
True
If it fails to read from UDP once, what makes us think it will succeed the second time?
It could be a temporary error, e.g. OS ran out of file descriptors.
But yeah, in general this should be rare. Maybe we can add a comment here explaining the implication for the time being?
Add an option to the management proto to enable or disable the lazy connection feature on the peer side. Command-line options and environment variables override the management configuration. - When the feature is enabled: Start the lazy connection manager and add all peers to the manager in the "active" st-ate. - When the feature is disabled: Stop the lazy connection manager and establish connections to all peers in the store. Fix deadlock The conn_mgr is protected by a mutex. However, because some internal functions are called via callbacks from the lazy connection manager, a deadlock can occur. The solution is to completely eliminate the callbacks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds an experimental “lazy connection” feature allowing on-demand WireGuard peer connections and refactors related engine and stats handling.
- Introduce
ConnMgr
to manage lazy vs. permanent connections based on traffic or feature flag - Integrate
ConnMgr
into the engine startup/stop flows and expose CLI/config flags - Refactor WG stats interfaces to return a map of stats and update parsing logic
Reviewed Changes
Copilot reviewed 60 out of 60 changed files in this pull request and generated 4 comments.
Show a summary per file
File | Description |
---|---|
client/internal/lazyconn/activity/listener.go | UDP listener for on-demand connection trigger |
client/internal/lazyconn/activity/listen_ip_linux.go | Linux build listen IP definition |
client/internal/lazyconn/activity/listen_ip.go | Default listen IP for other platforms |
client/internal/iface_common.go | Change GetStats to return map[string]WGStats |
client/internal/engine_test.go | Update tests to initialize and start ConnMgr |
client/internal/engine.go | Wire up feature flag, engine start/stop, and peer flows |
client/internal/debug/debug.go | Add LazyConnectionEnabled to debug bundle output |
client/internal/connect.go | Propagate lazy-connection flag into EngineConfig |
client/internal/conn_mgr.go | New ConnMgr implementing lazy connection logic |
client/internal/config.go | Read/apply LazyConnectionEnabled in config |
client/iface/iface.go | Update GetStats signature to return map |
client/iface/device/interface.go | Update GetStats signature to return map |
client/iface/configurer/usp_test.go | Adapt tests for new stats parsing function |
client/iface/configurer/usp.go | Refactor stats parsing into parseTransfers & helper funcs |
client/iface/configurer/kernel_unix.go | Update stats method to return map of WGStats |
client/cmd/up.go | Set lazy flag for up command |
client/cmd/status.go | Extend status filter options to include `idle |
client/cmd/root.go | Add --enable-lazy-connection CLI flag |
client/internal/dns/wgiface_windows.go | Remove outdated GetStats(peerKey) from DNS iface |
client/internal/dns/wgiface.go | Remove outdated GetStats(peerKey) from DNS iface |
Comments suppressed due to low confidence (2)
client/internal/lazyconn/activity/listener.go:33
- [nitpick] The error message has a grammatical issue: replace "failed to creating" with "failed to create".
return nil, fmt.Errorf("failed to creating activity listener: %v", err)
client/iface/configurer/usp.go:287
- [nitpick] The local variable
TxBytes
begins with an uppercase letter, which is typically reserved for exported identifiers. Rename it totxBytes
to follow Go naming conventions.
TxBytes, err := toBytes(key[1])
for { | ||
n, remoteAddr, err := d.conn.ReadFromUDP(make([]byte, 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Allocating a new buffer on each ReadFromUDP
call can be inefficient; consider reusing a single []byte
buffer declared outside the loop.
for { | |
n, remoteAddr, err := d.conn.ReadFromUDP(make([]byte, 1)) | |
buffer := make([]byte, 1) // Reusable buffer for reading packets | |
for { | |
n, remoteAddr, err := d.conn.ReadFromUDP(buffer) |
Copilot uses AI. Check for mistakes.
@@ -481,7 +482,7 @@ func connectToSignal(ctx context.Context, wtConfig *mgmProto.NetbirdConfig, ourP | |||
return signalClient, nil | |||
} | |||
|
|||
// loginToManagement creates Management Services client, establishes a connection, logs-in and gets a global Netbird config (signal, turn, stun hosts, etc) | |||
// loginToManagement creates Management ServiceDependencies client, establishes a connection, logs-in and gets a global Netbird config (signal, turn, stun hosts, etc) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nitpick] The comment refers to a "Management ServiceDependencies client", which is unclear. Consider restoring "Management Services client" or clarifying what "ServiceDependencies" refers to.
// loginToManagement creates Management ServiceDependencies client, establishes a connection, logs-in and gets a global Netbird config (signal, turn, stun hosts, etc) | |
// loginToManagement creates Management Services client, establishes a connection, logs-in and gets a global Netbird config (signal, turn, stun hosts, etc) |
Copilot uses AI. Check for mistakes.
|
Describe your changes
With the lazy connection feature, the peer will connect to target peers on-demand. The trigger can be any IP traffic.
This feature can be enabled with the NB_ENABLE_EXPERIMENTAL_LAZY_CONN environment variable.
When the engine receives a network map, it binds a free UDP port for every remote peer, and the system configures WireGuard endpoints for these ports. When traffic appears on a UDP socket, the system removes this listener and starts the peer connection procedure immediately.
Key changes
netbird status -d
commandConnection states
Idle: The peer is not attempting to establish a connection. This typically means it's in a lazy state or the remote peer is expired.
Connecting: The peer is actively trying to establish a connection. This occurs when the peer has entered an active state and is continuously attempting to reach the remote peer.
Connected: A successful peer-to-peer connection has been established and communication is active.
Issue ticket number and link
Checklist